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^5 ' Abstract 

This paper presents the first numerical implementation and tests of the Lagrangian- 
averaged Navier-Stokes-alpha (LANS-a) turbulence model in a primitive equation 
| ocean model. The ocean model in which we work is the Los Alamos Parallel Ocean 
Q\ ■ Program (POP); we refer to POP and our implementation of LANS-a as POP-a. 

Two versions of POP-a are presented: the full POP-a algorithm is derived from 
(^*> • the LANS-a primitive equations, but requires a nested iteration that makes it too 
slow for practical simulations; a reduced POP-a algorithm is proposed, which lacks 
- — , \ the nested iteration and is two to three times faster than the full algorithm. The 
q ■ reduced algorithm does not follow from a formal derivation of the LANS-a model 
"55 ■ equations. Despite this, simulations of the reduced algorithm are nearly identical to 
the full algorithm, as judged by globally averaged temperature and kinetic energy, 
and snapshots of temperature and velocity fields. Both POP-a algorithms can run 
stably with longer timesteps than standard POP. 

Comparison of implementations of full and reduced POP-a algorithms are made 
within an idealized test problem that captures some aspects of the Antarctic Cir- 
& \ cumpolar Current, a problem in which baroclinic instability is prominent. Both 
POP-a algorithms produce statistics that resemble higher-resolution simulations of 
standard POP. 

A linear stability analysis shows that both the full and reduced POP-a algorithms 
benefit from the way the LANS-a equations take into account the effects of the small 
scales on the large. Both algorithms (1) are stable; (2) have an effective Rossby 
deformation radius that is larger than the deformation radius of the unmodeled 
equations; and (3) reduce the propagation speeds of the modeled Rossby and gravity 
waves relative to the unmodeled waves at high wave numbers. 
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1 Introduction 



Ocean-climate models are typically run at relatively low resolution (1° or ~ 100km grid cells) in 
climate simulations due to the computational requirements of the coupled components and duration 
of the simulations, which might last for hundreds or thousands of model years. This resolution is well 
above the Rossby radius of deformation over most of the ocean domain, the typical horizontal size of 
eddies in the ocean. As a result, ocean-climate simulations only include the mean, large-scale flow, 
and not the eddies one might observe in satellite images. These eddies affect the mean circulation 
by transporting buoyancy and kinetic energy. Recent ocean-only simulations at resolutions of 1/10° 
and finer confirm that when eddy-scale dynamics are resolved, some of the more prominent biases in 
the mean circulation, such as the well known biases in Gulf Stream path and structure, are greatly 
reduced [1]. 

The goal of turbulence modeling is to capture the effects of small scale structures on the large-scale 
flow. In the case of ocean-climate models, this need is particularly pressing because the Rossby 
radius, the length scale where available potential energy is converted into kinetic energy, is not 
resolved except in the equatorial region. This leaves not just part of the mesoscale eddy spectrum 
unresolved, but all of that spectrum unresolved except perhaps in the region in which the equatorial 
jets occur. Thus, parameterization of those effects becomes a necessity. 

A particular parameterization of scalar transport, following the approach of Gent and Mc Williams 
(GM) [2] and extensions thereof [3,4] has been widely embraced in ocean modeling. These schemes 
involve two sources of scalar mixing. One of these is similar to horizontal mixing by diffusion, but 
its diffusivity is rotated slightly from the horizontal into a surface along which potential density is 
constant. The other source of scalar mixing involves the diffusion of layer thickness, which acts to 
flatten surfaces of constant density (isopycnal surfaces). This flattening of density surfaces releases 
potential energy and, thus, mimics the action of eddies generated through baroclinic instability. 
(The scalars in question here include heat and salt, the constituents of density.) This isopycnal 
mixing scheme, first applied to coupled atmosphere-ocean models in the Community Climate System 
[5], represents one of the more significant advances not only in ocean modeling, but more generally 
in climate. Implementing this parameterization allows coupled atmosphere and ocean models to be 
configured without the "flux corrections" formerly required in order to compensate for incompatible 
heat transports in oceanic and atmospheric components. 

A number of important mean flow features in the oceans, however, remain poorly represented in 
most ocean models. These flow features only come into focus in eddy-resolving models which remain 
too expensive for long time scale climate simulation. This resistance to parameterization by scalar 
transport on isopycnal surfaces persists in the Gulf Stream/North Atlantic Current system and in a 
few other regions. This is probably because the process through which the variability in the circulation 
feeds back on the mean is dynamically more complex than what can be adequately parameterized by 
diffusion of layer thickness. Hence, although the very serious issue of spurious cross-frontal diffusion 
of heat (the "Veronis effect" [6,7]) is greatly reduced by using the GM parameterization, the path 
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and structure of the Gulf Stream and its downstream development remain strongly biased. 

In this paper we consider for the first time a non-diffusive parameterization that primarily modifies 
the momentum equations. We anticipate that this parameterization will prove complementary to the 
scalar-transport-based parameterizations currently in use, a subject for later investigation. 

We have implemented a parameterization of mean multiscale transport effects known as the Lagrangian- 
Averaged Navier-Stokes alpha (LANS-a) model in a primitive equation ocean model. The LANS-a 
model is a member of a hierarchy of equations developed using asymptotic methods and Lagrangian- 
averaging in Hamilton's principle [8]. These equations have desirable characteristics, such as con- 
servation of energy and potential vorticity in the absence of dissipation. They also satisfy Kelvin's 
circulation theorem and conserve a form of potential vorticity. 

The LANS-a model is implemented by using two velocities: the Lagrangian-averaged velocity and 
the Eulerian-averaged velocity. The Eulerian-averaged velocity is related to the Lagrangian-averaged 
velocity through a smoothing operation. This smoother Eulerian velocity, which does not include 
the smallest scale variability, appears as the advecting velocity for both momentum and tracers. 
The Lagrangian-averaged (non-smoothed) velocity, which does include mean effects of the smaller 
scales, is transported by this smoother advecting velocity. The parameter alpha is a length scale that 
determines the amount of smoothing. Structures smaller than alpha are slaved to the larger structures 
in the Eulerian-averaged velocity. Solutions of the LANS-a equations converge to the Navier-Stokes 
solutions in the limit as alpha goes to zero, and the LANS-a equations are Galilean invariant. Note 
that the LANS-a equations change the nonlinear terms of the Navier-Stokes equations, rather than 
the dissipative terms, as most turbulence models, such as hyperviscosity, do. In other words, LANS-a 
is a model of stirring — the purely mechanical movement and rearrangement of fluid — rather than 
mixing, which simply occurs through diffusion (see, e.g. [9]). The LANS-a model also tempers vortex 
stretching while preserving the original form of the vortex dynamics [10]. This can be seen with the 
vorticity equation, where the vortex stretching term contains the smoothed velocity. The LANS-a 
equations may be interpreted as a closure of a viscous version of the GLM equations of Andrews and 
Mclntyrefll]. 

The alpha-parameterization was introduced within inviscid equations by Holm, Marsden and Ratiu 
[12]. Soon afterwards, its viscous version, LANS-alpha, was developed into a turbulence model by 
Chen et al. [13,14]. See also Marsden and Shkoller [15] for parallel developments. Its performance as 
a model of turbulence and mixing was tested in a variety of observations and numerical simulations. 
A summary of results obtained between 1997 and 2004 can be found in [16]; only the highlights are 
reviewed here. In high-Reynolds-number pipe flow, its steady analytical solutions matched observa- 
tions of mean velocity over several orders of magnitude, demonstrating its potential for describing 
the mean properties of turbulence over a wide range of scales [13]. In forced isotropic turbulence in 
a three-dimensional periodic domain, the energy spectrum E{k) was found to be proportional to the 
expected k~ 5 ^ 3 for ka < 1 [14]. However, for ka > 1 the energy spectrum turns over to approximate 
k~ 3 , thereby greatly reducing the number of degrees of freedom excited in the flow. This occurs be- 
cause the scales smaller than a (that is, ka > 1) are slaved to the larger scales. This slaving greatly 
reduces the computational work required in a numerical fluid simulation. In fact, the computational 
work for a turbulence simulation scales with the Reynolds number Re as Re 3 for Navier-Stokes and 
as Re 2 LANS-a. The LANS-a model was found to be at least as accurate as modern Large Eddy 
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Simulation (LES) models (such as the dynamic models in [17,18]) when measured against a reference 
solution in a high-resolution direct numerical simulation of Kelvin-Helmholtz shear layer instability 
[19]. 



Recent work has extended LANS-a to geophysical fluid dynamic models. For example, a quasi- 
geostrophic alpha model, developed by Holm and Nadiga [20], showed that the alpha model can 
capture the qualitatively correct time-mean circulation in the double-gyre problem at a resolution 
four to eight times coarser than a traditional quasi-geostrophic model. The shallow water LANS-a 
equations have been shown to allow larger time steps than the unaveraged shallow water equations 
[21]. A linear analysis leads to the understanding that this is due to a slow down of the frequency of 
the high wave-number gravity waves. As the grid is refined, the time step becomes independent of 
the mesh spacing and dependent on a. The LANS-alpha model's method of accounting for the effect 
of the small scales on the large impacts the way it models baroclinic instability in a two-layer QG 
model; it lowers the critical wavenumber, reduces the bandwidth of the instability, and preserves the 
value of forcing at the onset [22]. 



Our work presents the first implementation of the LANS-a model in a primitive equation ocean 
general circulation model, and therefore the first that could potentially be used for century to 
millennium scale climate projection. Implementation of LANS-a in Los Alamos' Parallel Ocean 
Program (POP) presented numerous new challenges, including most notably accommodating the 
barotropic/baroclinic mode splitting that is performed in ocean models of this class. This decom- 
position of the momenta into a depth-averaged barotropic component and the three dimensional 
baroclinic residual is responsible for a gain of efficiency on the order of one to two orders of magni- 
tude under explicit or semi-implicit time-stepping, and so must be preserved, despite the complication 
presented to implementation of LANS-a. Other challenges involved enforcement of continuity on the 
Eulerian-averaged velocity in the barotropic equations; correct handling of boundaries and topogra- 
phy; and establishment of an efficient algorithm that retains the essential properties of the LANS-a 
model. 



The organization of this paper is as follows: after reviewing the LANS-a model in §2, the POP-a 
algorithm is derived from the LANS-a primitive equation set in §3. A reduced version of the full 
POP-ct algorithm, which is faster by a factor of two to three, is presented in §3.5. The stability 
analysis in §4 shows that the effect of the LANS-a model is to increase the effective Rossby radius 
of deformation, lowering the wavenumber at which the onset of baroclinic instability occurs, and to 
slow down gravity waves and Rossby waves at high wave numbers. In this section we also show that 
the reduced algorithm does not change the character of the continuous equations very much, and 
that both the full and reduced algorithms are stable. Simulations using an idealized channel domain 
that invokes baroclinic instability are presented in §5. They show that POP-a statistics resemble 
higher-resolution simulations of standard POP, and that results from the full and reduced POP-a 
algorithms are nearly identical. Section §6 states our conclusions. 
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2 LANS-a model equations 



2.1 The LAN S-a model 



The nondissipative part of the LANS-a equations is obtained using the Euler-Poincare variational 
framework, where the Lagrangian expression of Hamilton's principle is obtained for each approxima- 
tion of the full equations of motion, such as shallow water, quasi-geostrophic, Boussinesq, and primi- 
tive equations [8,23]. The effect of this procedure is that two fluid velocities appear: the Lagrangian- 
averaged velocity, v, which is the momentum velocity, and the Eulerian-averaged velocity, u, which 
is the advecting velocity. In the original derivation, the Lagrangian-averaged and Eulerian-averaged 
velocities are related by the Helmholtz operator, 



where the displacement fluctuation £ = X — (X) is the difference between the spatial trajectory 
of a fluid parcel, X, and the Lagrangian mean of that fluid trajectory, (X). The covariance of the 
displacement fluctuation, (££), is a three- by-three symmetric tensor (in three-dimensional turbulence) 
that describes the small-scale (or short-time) fluctuations in the fluid flow. This tensor varies in both 
space and time. The size of the fluctuations described by is determined by the time-scale of the 
Lagrangian mean time-averaging operation (•), as described by Holm [8], p. 217. It is possible to 
replace (1) by a convolution (and thus, a filter), as discussed in Geurts and Holm [24]. 

The Lagrangian mean equations include a prognostic equation for each of the components of (££), 
which are then used in (1) and the Navier-Stokes equation. The additional memory and computation 
required by this fully dynamic covariance tensor could be prohibitively expensive for computational 
fluids models. In the original LANS-a model, the covariance displacement fluctuation tensor (££) 
is simplified to the scalar a 2 by assuming that the displacements are spatially isotropic, uniform 
throughout the domain, and constant in time. These are severe restrictions to the original Lagrangian 
mean equations. Nevertheless, the resulting LANS-a equation set has been shown to be an effective 
turbulence model [13,20]. 



2.2 The primitive equation LANS-a model 



The implementation of the LANS-a model in POP described in this work uses the primitive-equation 
form ([23], section 6.3): 




(1) 
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<9v 1 

— + u ■ Vv + u 3 d z v + VjVuj + f x u = Vvr + T (v) (2) 

at po 

Tz = ~ pg > (3) 

V • u + 8 z u 3 = 0, (4) 

v=(l-a 2 (V 2 + <9 2 )) U , (5) 



where v = (i>i,i>2) and u = (u 1 ,u 2 ) are horizontal velocities and V is the horizontal gradient. 
The Helmholtz operator in (5) indicates that the Eulerian-averaged velocity u is smoother than 
the Lagrangian-averaged velocity v. In the Lagrangian mean equations, the covariance displacement 
fluctuation tensor is determined prognostically by the equations. In the LANS-a model, alpha 
takes the place of this tensor; this length scale alpha is the only user-defined parameter in the model. 
Alpha determines how smooth u is; more precisely, structures in v which are smaller than alpha are 
strongly suppressed by the inverse Helmholtz operator and do not appear in u. In practice alpha 
is chosen to be on the order of a grid-cell width. An important question is how well the effects of 
turbulence are represented in the particular context of a primitive equation ocean model as alpha is 
varied; this is addressed in [25]. 

A fundamental characteristic of the LANS-a equations is that the smoother Eulerian-averaged ve- 
locity u is the advecting velocity. Thus the small-scale structures in v are advected by the smoother 
velocity u. However, this advection is not like passive tracers embedded in the flow. Rather, it is fluid 
transport as in Kelvin's circulation theorem, where the line element with respect to which the circula- 
tion is defined must also be advected. This is the source of the additional term Y%=i v j^ u j = (Vu) T -v 
in the motion equation for the alpha model. 

The diffusion operator T acts on the Lagrangian-averaged velocity v, so (2) can be thought of as 
an advection-diffusion equation for v, and v as the specific momentum. Additional terms are the 
Coriolis force f x u, the pressure gradient po _1 V7r, and the additional nonlinear term, Y^j=i v jVuj. 
In the absence of this extra nonlinear term, the LANS-a model reduces to the Leray model [8]. A 
comparison of the LANS-a and Leray models will be presented in a future work. 

The pressure it is not the physical pressure but a modified pressure, where 

7r = p-i|u| 2 -y|Vu| 2 (6) 

and p is the actual pressure. However, in the algorithms presented in this paper, the modified pressure 
7r is never calculated this way. Instead, it is computed at each time step from the hydrostatic equation 
(3) once the density at that step is known. This is identical to the way the physical pressure is 
computed in POP. 

The LANS-a version of tracer equations in a primitive-equation ocean model is 

^ + u-V^ = PM, (7) 
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where ip may be temperature, salinity, or another conserved quantity, and D is the horizontal and 
vertical tracer diffusion. The tracer advecting velocity is the smooth velocity u. This is the velocity 
responsible for transport of fluid particles carrying mass and salinity. This is not a model choice; rather 
it is part of the derivation of the alpha model. Before their diffusivities were introduced as kinetic 
coefficients, the tracer equations described quantities such as heat and salinity that were simply 
carried along with each fluid parcel. The Lagrangian average of such a quantity that is conserved 
on fluid parcels is simply its average value, computed at fixed Lagrangian coordinate. Likewise, the 
Lagrangian averaged equation for such a process keeps the same form as the exact dynamics, but the 
transport velocity in this equation is replaced by the Lagrangian averaged particle velocity (u) and 
the instantaneous value of the tracer is replaced by its average value taken along the trajectory of the 
fluid parcel. The value of the diffusivity of such a Lagrangian averaged tracer is a model parameter. 
However, the form of its dynamics was fixed by the use of the Lagrangian average in the derivation 
of the alpha model. 



3 Implementation of LANS-q in POP 

3.1 POP barotropic /baroclinic splitting 

The Parallel Ocean Program (POP), developed at Los Alamos National Laboratory by Smith, Dukow- 
icz and Malone [26], is well-known in the ocean modeling community. As the ocean component of 
NCAR's Community Climate System Model [27] and NCAR's Parallel Climate Model, it is used for 
twenty-first century climate simulations by the Intergovernmental Panel on Climate Change (IPCC). 

POP uses finite differences to discretize the primitive equations, and has a leap-frog time stepping 
scheme. The horizontal grid is a logically rectangular B-grid, the vertical coordinate is z-level and 
incorporates a free surface [28] and optional partial bottom cells [29]. Diffusion can be Laplacian or 
biharmonic, and the Gent-Mc Williams scheme is available for tracer diffusion [2]. 

The barotropic/baroclinic mode splitting used in ocean general circulation models presents a signif- 
icant complication to the implementation of LANS-a. The mode splitting is performed in order to 
isolate the very fast free-surface gravity waves, which have speeds in excess of 200 m/sec. These fast 
external waves are responsible for the very rapid propagation of tsunami events, for example, but have 
negligible bearing on ocean circulation. The problematic external mode is almost completely isolated 
by a modal decomposition into the vertically averaged velocity and the departure from that average, 
with internal waves speeds of a few meters per second, at most, presenting a much less stringent 
time step limitation within the 3-dimensional baroclinic equations. The much faster external mode is 
treated by solving the barotropic equations with an implicit method in POP; some other models use 
an explicit subcycling of the barotropic equations. In either case the isolation of the external mode 
within a lower-dimensional set of equations results in a tremendous gain in efficiency. It is within 
these split equations that the model is implemented in order to arrive at a primitive equation ocean 
model containing the LANS-a turbulence scheme in a form suitable not just for the very limited 
research application that could be addressed with a simpler unsplit model, but for consideration in 
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climate modeling. 



In POP, the barotropic velocity \J(x, y, t) = (Ui, f/ 2 ) is denned as the vertical integral of the horizontal 
velocity u(x,y,z, t) = (1*1,1*2), 



U = / u dz, (8) 

H + 7] J-H 

where H(x, y) is the ocean depth when the surface is at rest and rj(x, y, t) is the free surface height. 
Subscripts are used on U and u to indicate horizontal directions because we reserve V and v for the 
rough velocity in the LANS-a model. 

The continuity equation may now be integrated in the vertical to produce a prognostic equation for 
the free surface height, 



f (V-u + d z u 3 )dz = 0, (9) 

J — H 

^ + V-(H + r ] )U = 0. (10) 



An outline of the POP algorithm for the momentum equation using the baroclinic/barotropic splitting 
at time step n + 1 is presented graphically in Fig. 1 and as follows: 

(PI) Baroclinic component 

leapfrog step: u™ +1 = u]J + 2AtRHS%, k = 1 . . . km 

where RHS^ contains the momentum forcing: advection, centrifugal, Coriolis, diffusion, and 

pressure gradients. 
(P2) Subtract depth-average: 

u^u^-iE^u^ 1 ^, k = l...km 
(P3) Barotropic component: 

compute r] n+1 using the conjugate gradient method. 

leapfrog step: U n+1 = U^ 1 + RHS n 

where RHS n contains Vr/, the Coriolis term, and the vertically integrated forcing. 
(P4) Add baroclinic and barotropic velocities: 
u£ +1 = u n k +l + U n+1 , k=l...km 

In this notation, u£ is the baroclinic horizontal velocity at the kth vertical level at time step n, for 
levels k = 1 . . . km, At is the time step, and u is an intermediate baroclinic velocity. This outline 
contains only enough detail for the purposes of this paper; a complete description can be found in 
the POP reference manual [29]. 
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3.2 POP-a baroclinic implementation 



Because POP's baroclinic and barotropic components use explicit and implicit time stepping, respec- 
tively, the implementation of the alpha model in POP was quite different in these two parts of the 
code. The explicit baroclinic part was straightforward, while the implicit barotropic part presented 
numerous difficulties, as detailed in the following section. 

When POP is adapted to use the LANS-a model (2-5), the following changes must be made: (1) there 
are two full velocity fields, the Eulerian-averaged, or smooth, velocity u and the Lagrangian-averaged, 
or rough, velocity v; (2) likewise, there is a smooth and rough barotropic velocity, U and V; (3) the 
appropriate velocities must be used in each term in the momentum equation; (4) the nonlinear term 
Vu T • v is added to the forcing terms in the momentum equation; and (5) the advecting velocity in 
the tracer equation is the smooth velocity u. 

Because the time derivative in the LANS-a momentum equation (2) is on the rough velocity v, one 
must take a time step in v and then compute u by inverting the Helmholtz operator (5), i.e. 



r n+l 



vr 1 



2At 



+ u£ ■ Vv£ + ul k d z v% 



u 



n+l 



a 2 V 2 



i=i 

Po 

-i i -| 

k = 1 . . . km 



n+l 



(11) 

(12) 



where \ k and u k are the rough and smooth horizontal velocities at time step n and vertical level k, 
v™ k is the j th component at level k, and V is the horizontal gradient. 

The outline for the POP-a algorithm is then 



(Al) Baroclinic component 



leapfrog step: v 



n+l 



2AtRHS%, 



k = 1 . . . km 



where RHS k contains the momentum forcing: 



extra nonlinear term u™ k Vv™ k 



centrifugal, Coriolis, diffusion, and pressure gradients. 
(A2) Subtract depth-average from v: 



advection, 



~n+l 



r n+l 



iE^ivf 1 ^, k = l...km 



(A3) Solve for smooth baroclinic velocity: 



u 



(A4) Subtract depth-average from u: 



-n+l 



U 



U 



n+l 



jjT^xK +1 dz K , k = \...km 



(A5) Barotropic component: 

compute rj n+l using the conjugate gradient method, 
leapfrog step: V n+1 = V" 1 + RHS n 
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where RHS n contains V77, Coriolis term, and the vertically integrated forcing. 



(A6) 



solve for smooth barotropic velocity, U™ +1 



(A7) Add baroclinic and barotropic velocities: 
v™ +1 = v? +1 + V n+1 , k = l...km 



u£ +1 = u™ +1 + U n+1 , k = 1 . . . km 



The boxes indicate steps which were added to POP to implement the LANS-a model. Velocities 
v, u, and u are intermediate baroclinic velocities. Fig. 2 shows a graphical version of the POP-a 
algorithm. The depth-average must to be subtracted from v and then again from u in steps (A2) 
and (A4) to guarantee that both have zero barotropic component. 

Note that in the Helmholtz operator (1 — a 2 V 2 ) \ the smoothing is in the horizontal directions only. 
This is a reduction from the original LANS-a formulation (5), which includes a vertical component. 
This reduction was chosen not only to reduce computational time, but because it is numerical res- 
olution in the horizontal plane which limits physical processes at the scale ofthe Rossby radius of 
deformation and presents the opportunity for effective use of the turbulence model. 



The extra nonlinear term — the fourth term in (11) — only includes a summation from j = 1...2 
rather than j = 1 ... 3 as in the original LANS-a equations. The missing term, M3VW3, involves only 
vertical velocities and is several orders of magnitude smaller that the first two terms in the summation 
in primitive equation applications. If this term were included a smoothed vertical velocity M3 would 
have to be computed at each horizontal level, adding to the overall computational cost for this 
unimportant term. 



Simulations where LANS-a was implemented in only the baroclinic component (A1-A4) were imme- 
diately unstable, failing to converge within 100 timesteps. Those with LANS-a only in the barotropic 
component (A5-A6) ofthe algorithm were usually unstable, and failed to converge after 10 5 timesteps 
(30 years). Thus LANS-a must be uniformly implemented in both components. 



3.3 POP barotropic implementation 



The abbreviated description of the barotropic component in steps 5 and 6 on the POP-a algorithm 
above does not reveal the full complexity of these steps. Essentially, one needs to find V n+1 , U n+1 , and 
rj n+l from the vertically integrated momentum and continuity equations by the end of the procedure. 
However, both of these equations depend on the velocities and surface height, so there are potentially 
two simultaneous implicit solves. 

To review this difficulty in more detail, we step back from the LANS-a model and review the POP 
barotropic implementation by Dukowicz and Smith [28] . The leapfrog discretization of the barotropic 
momentum equation and vertically integrated continuity equation are: 



10 



Tjn+1 _ Tjrc-l 

2At + BU« = + G», (13) 

' - ' + V ■ #U e = 0, (14) 

where g is gravitational acceleration, G n = (G™, G™) are the a; and y components of the vertically 
integrated baroclinic momentum forcing terms, and the Coriolis parameter is written as the matrix 



B 



-/ 

/ 



(15) 



The superscripts (, and 9 are parameters which specify the degree to which the Coriolis term, 
pressure gradient, and continuity are implicit: 



v e = £'u»+i + (j _ £/ _ 7 ')u« + yu 71 - 1 , (16) 

vt = tv n+1 + (l-Z-l)v n + lV n -\ (17) 
U e = 9U n+1 + (1 - #)U n . (18) 

In the POP code, the pressure splitting is equally weighted (£ — 7 — 1/3) and the divergence is 
fully implicit (8 = 1). These choices are the optimal choice of parameters to damp the computational 
gravity wave mode ([28], Fig. 2). One may specify implicit Coriolis (£' = 7' = 1/3) or centered 
explicit Coriolis (£' = 7' = 0) in the POP input file. The advantage of an implicit Coriolis term is 
that a larger time step may be taken, sometimes by a factor of two. The POP-a algorithm described 
in this paper uses an explicit Coriolis term because the implicit version would require an iterative 
method to solve the barotropic momentum equation. 

Using £ = 7 = 1/3, 9 = 1, and explicit Coriolis (£' = 7' = 0), equations (13 — 14) become 



U 



n+1 



m-1 



2 



G n - BU n - 07V (v n+1 +V n + V"' 1 



n+1 



V-HU 



n+1 



0. 



(19) 
(20) 



where r = 2At. These equations are both implicit and so must be solved simultaneously for U n+1 
and r] n+1 . Substituting U n+1 from (19) into (20) and solving for the pressure, we have 



V-ifV- 



-,n+l 



19T 



19 T 



jrg 



-V ■ tfU™- 1 



G" - BU n - # 7 V (rj n + r/ n_1 



+— V-H 

91 

U n+1 = U^ 1 + r\G n - BU n - 27V (r/ n+1 + 7] n + 7] 71 - 1 



[21) 
'22) 
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Thus the algorithm for solving the barotropic equations with explicit Coriolis terms is to solve (21) 
for 7] n+1 and then (22) for U n+1 . Solving for r/ n+1 requires the inversion of a Laplacian-like operator; 
POP uses a conjugate gradient (CG) routine. 



3.4 Full POP -a barotropic implementation 



The POP-a version of the barotropic equations (13-14) are: 



yn+l _ yn-1 + = _ TgVr f + rQ n ? 

- (V n+1 - rf) + V • H U e = 0, 
Using the same weighting choices as POP with explicit Coriolis terms, we have 



(23) 
(24) 
(25) 



yn+l _ y 



n-1 



G" - BU n - 73V (V l+1 + r] n + rf- x 
+1 - rf] + V ■ HU n+1 = 0. 



(26) 
(27) 



To solve equations (25—27) for V n+1 , U n+1 , and r] n+1 , replace V n+1 and V"" 1 in (26) with U n+1 
and U n_1 using the Helmholtz relation, solve for U n+1 , substitute into (27), and solve for r) n+1 . The 
algorithm is then 



Full POP-a barotropic algorithm: 



(v.ir(i 



a 2 V 2 



2T + 



n-1 



+— V ■ H ( 1 - a 2 V 2 



£7 

-y""+l _ -yn—1 



+ r 



G" - BU n - ig V (r] n+1 + 7] n + 7] n - 1 



U 



n+l 



1 - a 2 V 2 



V 



n+l 



(28) 
(29) 
(30) 



The added complexity of the algorithm due to the LANS-a model in these equations is the smooth- 
ing step wherever the (1 — a 2 V 2 ) 1 operator appears. This Helmholtz inversion is solved using the 
iterative CG method. The operator in front of r/ n+1 in (28) is also inverted using the CG method. 
This means that solving the full algorithm requires solving a nested inversion. There are typically 50 
iterations of CG in standard POP, so here there might be 50 2 iterations to solve for r] n+1 when the 
Helmholtz inversion is used. 
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Another option is to use a simple filter such as nearest-neighbor averaging instead of a Helmholtz 
inversion to avoid the nested iteration. Filters have been successfully used in LANS-a models in large 
eddy simulations by Geurts and Holm [30]. Using a local filter in place of the Helmholtz inversion 
improves speed, but the filter must still be applied within each CG iteration for the r] n+1 solve. 



The best option would be to avoid the smoothing operator on the LHS of (28) altogether. This is 
the motivation for the reduced algorithm presented in the next section. Data on the computing time 
for these different options are discussed in section 5.2 



Another version of the algorithm (28 — 30) would be to replace (30) with 



U 



ra+l 



u 



n— 1 



+r (l - a 2 V 2 



G n - BU" - gi V [r] n+i + rj n + ri 



,n-l 



(31) 



which can be obtained by combining (29), (30), and (25). Although this is a valid derivation of the 
model equations, it was found to be unstable in practice. In (31), U n+1 is calculated from U n_1 using 
a pressure-averaged leap-frog time step. In practice this allows the smooth velocity U to drift away 
from the rough velocity V: Numerical experiments where this algorithm is used proceed as expected 
initially, but after five or ten years (order 10 6 time steps) U is seen to depart greatly from V. This 
drift is avoided by using (30), with U computed as a smooth version of V at every step. 



There are two types of variables that are smoothed in the full POP-a algorithm: the velocity in 
(30) and the pressure gradients in (28). The form of the pressure gradient term is (1 — a 
which comes directly from the derivation of the equations and was used in the algorithm. This form 
requires boundary conditions for the pressure gradient on the boundary, which are zero. In a periodic 
domain, the order of the filter and gradient could be changed, that is, V(l -a 2 V 2 ) rj could be 
used instead. This change in the operator order is not possible with solid boundaries, as then the free 
surface height r] must be specified at the boundary, and it is unknown. Thus the Helmholtz inversion 
must be applied to the pressure gradient, not to the pressure. 



3.5 Reduced POP-a barotropic implementation 



The full algorithm, (28 — 30), is an exact derivation of the LANS-a primitive equations, but has the 
overwhelming disadvantage that it is extremely slow due to the Helmholtz inversion on the LHS of 
(28). In this section we test a reduced algorithm that does not include this inversion, and find that 
numerical experiments of this reduced algorithm are almost identical to the original. 

First, we must review the barotropic POP algorithm as it is actually implemented in the code: 
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n-l 



u = u 



V-ifV 



G n - BU n - 7^V (rf + 2r/ n " 1 
! r rf + V • H i 



,n+l 



U n+1 = U - t 7 #V (ri n+1 ~ V n ~' 



719 



-U + V77 



n-l 



(32) 
(33) 
(34) 



These equations are equivalent to (21 — 22); they have the additional step of computing an auxiliary 
velocity U because it is required when the Coriolis term is implicit. 



Our goal in designing the reduced POP-a algorithm was to capture the effects of the LANS-a model, 
as represented by the full algorithm, but with as few additional computational steps as possible. The 
reduced algorithm, 



Reduced POP-« barotropic algorithm: 



V = V 



n-l 



+ r 



G" - BU n - 73V (rf 1 + 2r ] n - 1 )' 



U 



1 - a 2 V 2 
2 



-1 



V 

n+l 



ig-T 



v 



1 ;V n + V-H\—ij + Vrf 1 - 1 



19T 



719 



V 
U 



n+l 



V - rjgV [r] n+1 - 7] 



n+l 



U — T'jgV (rj 



jn+1 



V 



n-l 



(35) 
(36) 

(37) 

(38) 
(39) 



uses only a single smoothing step, which is in (36). In terms of equations, there are only two additions 
to the original POP implementation (32 — 34): the computation of the smooth auxiliary velocity in 
(36) and the final smooth velocity in (39). 



The reduced algorithm (35 — 39) is written with auxiliary velocities U, V because the POP code is 
structured in this way to accommodate an implicit Coriolis force. However, we are using an explicit 
Coriolis force, so the algorithm can be rewritten in an equivalent form without the auxiliary velocities 
as: 
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G n - BU" - # 7 V [rf 1 + 2r/ n " 1 



(40) 



U 



n+l 



1 - a 2 V 2 



V 



71-1 



+r ( 1 - a 2 V 2 



-37V (77 



,n+l 



G n - BU n - gi V [r] n + 2r] 



n-1 



V 



n+l 



V 



n-1 



G™ - BU n - 73V (r/ n+1 + rf 1 + rf" 1 ) 



(41) 
(42) 



In the standard POP implementation, the Coriolis term in the barotropic momentum equation may 
be chosen to be implicit or explicit using £' and 7' in (16). The POP-a algorithms presented here 
use an explicit Coriolis force. However, standard POP simulations typically use an implicit Coriolis 
force, which allows one to take a longer timestep (Table 2). Thus there is a motivation to implement 
POP-a with an implicit Coriolis term as well. Following Appendix C of [28], it can be shown that 
(35) should be replaced by 



I + r 7 B (l - ct 2 V 2 ) J (V - V"" 1 

G" - 7B (U n + 2U ?t - 1 ) - 73V (rf +1 + rf 1 + r] n ~ l 



(43) 



for an implicit Coriolis force. Unfortunately, solving this equation would be extremely slow since the 
inversion of (l + r 7 B(l — a V ) ) would require an iterative routine. In the standard POP code, 
this operator is simply (I + T7B), which is a 2x2 matrix and only takes a few operations to solve 
for each grid point. If an iterative routine, like a CG solver, is required to implement the POP-a 
momentum equation with implicit Coriolis, it would negate any efficiency gains due to the smaller 
time step. Thus POP-a was not implemented with an implicit Coriolis term. 

Fortunately, POP-a with explicit Coriolis runs stably using longer timesteps than standard POP with 
explicit Coriolis (Table 2). In fact, the maximum timestep with POP-a was found to be comparable 
to that with standard POP and implicit Coriolis. This is an important benefit of the LANS-a model, 
and makes the lack of an implicit Coriolis version a moot point. 



3.6 Unstable variations of the reduced POP-a algorithm 



A slightly altered version of the reduced algorithm was tried in which (37) was replaced with 
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- J r/ n+1 = \r) n 

19 T J 19 r 



+V • H + (l - a 2 V 2 ) 1 Vrf" 1 J . (44) 

The only change from (37) is that the V?]" -1 term is smoothed here. This version is closer to the full 
algorithm, as can be seen by substituting in the auxiliary velocity, 

[ V • HV 2 —A r] n+1 = 2 —7 1 n + — V -H(l- a 2 V 2 ) _1 V"" 1 

V jgr 1 igr z r^g v ' 



— V • H ( 1 - a 2 V 2X ~ 
91 v 



G n - BU n - 37 V (j] n + rj" 1 " 1 )] (45) 



That is, (45) is more similar to the full algorithm (28) (only one smoothing, on the LHS, is missing) 
than is the reduced algorithm (40). Despite being closer in appearance to the full algorithm, this 
version of the reduced algorithm was found to be unstable in practice, and is therefore not a viable 
option. This instability is not revealed by the linear stability analysis; the damping factor for this 
algorithm is nearly identical to the full POP-a algorithm. One possible explanation for this instability 
is that when the smoothing operator is removed from the LHS of the t] equation in the reduced 
algorithm, it must be partially removed from the RHS as well. 

Another alteration of the reduced POP-a algorithm is to replace (39) with 



U n+1 = (l - a 2 V 2 ) V n+1 . (46) 

Simulations where this method was employed were often unstable. Even though (46) follows directly 
from the LANS-a equations, it appears that (39)-which lacks the smoothing operation on the Vr/ 
terms-works better in practice. Again, this is probably because the reduced barotropic algorithm 
lacks pressure gradient smoothing in (37), and a corresponding lack of smoothing in (46) makes for 
a more stable algorithm. 

One of the difficulties encountered in devising the appropriate POP-a algorithm was to ensure that 
the velocity is nondivergent at each baroclinic level. The LANS-a equations specify that the smooth 
velocity u satisfies the continuity equation (5). This is a practical requirement as well; simulations 
where the rough velocity v is used in the free surface height equation (40) (which is derived from the 
continuity equation) were found to be unstable. 



4 Stability Analysis of the POP-a Barotropic Solver 



The stability analysis of the POP algorithm by Dukowicz and Smith [28] was used to find the 
best weighting for the implicit /explicit variables in (16-18), which are 7 = £ = £' = 1/3 and 
9 = 1. For gravity waves, this choice of parameters strongly damps the computational mode, and 
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slightly damps and slows down the the physical modes. For Rossby waves, the computational mode 
is slightly damped, and the physical mode is nearly undamped but slowed down. These results are 
reproduced here, in order to compare with the stability analysis of POP-a. The analysis is partly 
based on previous work by Wingate [21], who investigated the stability of the LANS-a shallow water 
equations by comparing a third-order Adams-Bashforth method to the Dukowicz and Smith [28] 
POP barotropic algorithm. 

The barotropic component of POP (and POP-a) compute the vertically integrated velocities and the 
free surface height, and is therefore the same as the shallow water equations. The dispersion relation 
for the continuous equations is derived by transforming the equations of motion into dimensionless 
form, combining them, and assuming a plane wave solution of the form e l ( k h-^-^t) f or remaining 
variable. This analysis appears in [28] and [21], and only the results are stated here. For the continuous 
shallow water equations, the dispersion relation for (external) gravity waves are 



for the shallow water equations (used by the barotropic component of standard POP) and 



= FHl + a^kl) (48) 

for LANS-a (used in POP-a), where k\ = k 2 + I 2 is the horizontal wave number, F = U / y/gH Q is 
the Froude number, g is gravitational acceleration, H Q is the average fluid depth, and U a typical 
velocity scale. The dispersion relation for Rossby waves is 

"•^ (49) 

for the shallow water equations and 



- k 2 (1 + a 2 k 2 ) + 1/B 2 [M) 

for LANS-a, where B = Rj L, R = \JgH j f Q is the Rossby deformation radius, / = fo + (3y is the 
Coriolis parameter, j3' = /3L 2 /U is the dimensionless beta parameter, (3 = d y f, and L is a typical 
length scale. For scales that are much larger than alpha (1/kh >> a), 1 + « 2 A;^ — > 1, and the 
LANS-a dispersion relations are identical to their Navier-Stokes counterparts. For scales near alpha, 
1 + a 2 k 2 ~ 2, so that both the gravity waves and Rossby waves are slowed down. 

We now investigate how varying a effectively changes the Rossby radius in these equations. First, 
define the Rossby wavenumber as k r = L/R, and note that the Rossby wavenumber maximizes the 
dispersion relation for the Navier-Stokes equation (49). This is clear in plots of the dispersion relation 
(see Fig. 2 in [21]). Analogously, we define an effective Rossby radius R* for the LANS-a model using 
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the wavenumber that maximizes (50). Solving du r /dk = for k and letting I = and k = L/R*, we 
produce a relationship between the Rossby radius R and the effective Rossby radius R* as a function 
of alpha: 




When a = (no alpha model) then R = R*, as expected. As a increases, the effective Rossby radius 
increases (Fig. 4). This is the mechanism by which POP-a allows more eddy activity than standard 
POP at scales just above the actual Rossby radius. 



4-1 Gravity waves 



In this section we conduct the stability analysis for gravity waves in the discrete barotropic equations 
for standard POP, full POP-a, and reduced POP-a. The notation of the derivation follows [21], 
section 4.2. The starting point is the POP-a barotropic equations, (23-25), which are then rescaled 
using a velocity scale U, a length scale L, a height scale Hq, and a time scale r. In the limit where 
the timescale r = L/U, the equations are 

5 n+l _ 5 n-l + = 0, (52) 

F 

v n+l - v n + At5 e = 0, (53) 
5 n+1 = (l-a 2 V 2 )"V +1 (54) 

where 5 = d x U\ + d y U2 is the divergence, and the superscripts £ and 9 have the same meaning 
as (17-18). The standard POP equations are recovered when a —> so that 8 — 8. Substituting 
5 n = X n e lkhX 5 and if 1 = \ n e tkhX f) into the above equations, we obtain the characteristic polynomial 

(A 2 - l) (A - 1) + 2 -^-PS = 0. (55) 
[28,21] where Q 2 = At 2 ^, Q is a nondimensional CFL number, and 

S = 6\ + 1 - 9. (56) 

The difference between POP, full POP-a, and reduced POP-a in this analysis is in the polynomial 
P, which comes from the V 2 r/^ term. For POP, 

P = £A 2 + (l-£-7)A + 7- (57) 
For the full POP-a algorithm, a Helmholtz inversion is applied to the full V?/ term in (28), so that 
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P = (l + a 2 |^| 2 ) 1 (£A 2 + (1 - £ - 7 )A + 7) ■ (58) 

For the reduced POP-a algorithm, the Helmholtz inversion is not applied to Vr/ n+1 and one Vr/"" 1 
in (37): 

P = (^A 2 + (l + « 2 |^| 2 )^ ((1 - £ - 7 )A + 2 7 ) - 7) • (59) 

The gravity wave amplification factors and dispersion errors for these algorithms are shown in Fig. 
3. These curves were computed numerically from the characteristic polynomial (55) using typical 
parameters, 8 = 1 and £ = 7 = 1/3. The Coriolis term does not appear in the divergence equation 
(52), so this analysis is valid for both implicit and explicit Coriolis schemes, and the parameters £' 
and 7' do not appear. Because the plots are shown with Q on the horizontal axis, a\kh\ is left as a 
free parameter and was chosen such that a ~ 

Fig. 3 shows that all schemes are stable to linear gravity waves, because all of the damping factors 
are less than one. Fig. 3a, for standard POP, is the basis of comparison, and is identical to Fig. 
2 in [28]. The POP-a reduced algorithm damps the physical gravity waves more strongly and the 
computation gravity waves less strongly than both POP and the full POP-a algorithm. 



4-2 Rossby waves 

To conduct a stability analysis of the Rossby waves, we begin with the POP-a equations in streamfunction- 
divergence form, 

Ro (V 2 ^ n+1 - V 2 ^"- 1 ) + 2At + ¥') = 0, (60) 

Ro (^ n+1 - + AtB 2 5 e = 0, (61) 

U? = -d v V n , UZ = d x ^ n , r] n = ^ n , (62) 

q n = (l- a 2 V 2 ) _1 * n . (63) 

([21], sections lb2 and 5b), where the Rossby number Ro = U/f Q L. Again, the standard POP 
equations are recovered when a — > 0, so that ^ = \l/ and 5 = 5. The free surface height 77 is equal to 
the smooth streamf unction ^ because of geostrophic balance, where the V?/ is balanced by the Coriolis 
force, which uses the smooth velocity. Geostrophic balance also requires that the implicit /explicit 
weighting of the Coriolis velocity (16) and pressure (17) are identical, so that £ = £' and 7 = 7' 
([28], section 3.3). This means that the explicit Coriolis formulation, where £' = 7' = 0, cannot be 
considered in this stability analysis, because for stability £ > 1/4 is required [28]. 

This system has a characteristic polynomial of 
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(l + a 2 \k h \ 2 ) B 2 k 2 h (A 2 - l) S + 2P [A - 1 - Atikf3'B 2 s] = (64) 
where S is defined in (56) and 

P = £A 2 +(l-£~7)A + 7- (65) 

In this Rossby wave analysis, the full and reduced POP-a algorithms have identical characteristic 
polynomials. That is because when one takes the curl of the momentum equation to get the vorticity 
equation (60), the V?7 term drops out, and the Vrj term is where the differences between the full and 
reduced POP-a algorithms appear. 

Here B 2 = R 2 /L 2 , so the alpha model is effectively making the Rossby Radius, R, larger in the 
first term of (64). This same effect was observed in the continuous equations (Fig. 4). The damping 
factors, using the typical choice of parameters (0 = l,£ = 7 = £ / = Y = l/3) are shown in Figure 5. 
Compared to standard POP, POP-a slows down the Rossby waves. The damping of Rossby waves 
is unaffected in this analysis. 



4-3 Full Stability Analysis 



A stability analysis of both gravity and Rossby waves may be conducted by beginning with the full 
beta-plane equations in vorticity-divergence, dimensionless form (following [28], equations 19 and 41) 

d t 5 - C + eU x + V 2 7] = 0, (66) 
d t C + 6 + eU 2 = 0, (67) 
dtv + 6 = (68) 

where the barred variables are smoothed, as in (54), and e = (3L/f . Discretizing and introduc- 
ing Fourier modes as before, one obtains a fifth-degree complex characteristic polynomial for the 
amplification factors: 

2|^| 4 At 2 (A 2 - l)PS - 4iek\k h \ 2 At 3 PP'S 
+ \k h \ 2 (X - 1)(A 2 - l) 2 + 4|^| 2 At 2 (A - 1)P' 2 

-2iekAt(X - 1)(A 2 - 1)P' - AielAt 2 {\ - 1)P' 2 = (69) 
where S is defined in (56), 

P' = ^A 2 + (l-r-7')A + 7 / (70) 

depends on the implicit /explicit Coriolis parameters, and P is defined as in (57-59) for POP and 
various POP-a algorithms. 
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Figure 6 shows that all versions of POP are stable when Atf a = 1/2 and \k\ > 1. These plots show 
instabilities when \k\ < 1, but this does not affect the overall stability. As explained in [28] for 
standard POP with explicit Coriolis terms, there is always a region where |A| > 1 for sufficiently 
small \k\. In practice, this instability is not an issue if At is chosen such that Atf Q < 1. This is true 
of the POP-a algorithms as well. 

Comparing the POP-a reduced algorithm with POP in Fig. 6, we see that the Rossby waves are not 
damped in either, the computational modes are less damped and the Poincare-wave modes are more 
damped in the POP-a reduced algorithm than in standard POP. 



5 Results 

Long-time simulations of POP, full POP-a, and reduced POP-ct were run in a channel-model domain. 
In this section, we show that the reduced POP-a algorithm produces results that are nearly identical 
to the full POP-a algorithm, but is much faster. For the POP-a simulations, smoothing was achieved 
by both a Helmholtz inversion, as prescribed by the LANS-a equations, and a simple filter that 
averages nearest neighbors. Using the filter to smooth also produces a substantial speed-up over the 
Helmholtz filter, which requires an iterative method; a comparison of smoothing methods is addressed 
in [25]. 

5.1 Description of the Model Problem 

The model problem can be thought of as an idealization of the Antarctic Circumpolar Current. The 
Circumpolar region is unique in the World Ocean in being zonally continuous, the only region of the 
ocean where there is no continent against which a zonal pressure gradient can be supported, and 
consequently the only region where meridional heat transport falls so heavily to the mesoscale eddies. 
A reentrant channel model therefore provides a relevant setting in which to consider the impact of 
a turbulence parameterization in an ocean general circulation model. Our test problem is based in 
part on the works of Karsten et al. [31] and Henning and Vallis [32]; the physical analysis in both 
of those works is more thorough than what we present, as our focus is on model development rather 
than physical oceanography. 

The zonally periodic model domain has solid boundaries to the north and south (Fig. 7). An eastward 
wind stress drives an eastward circulation in the channel. A deep-sea ridge between HE and 18E 
that is uniform from north to south forces the water column northward from 10-15°E and then 
southward again from 15-20°E by conservation of potential vorticity (Fig. 8) (see, e.g. [33] p. 100). 
This deflection of the mean flow spawns mesoscale eddies to the east of 18°E if the resolution is 
sufficiently high. For this study, POP was run in three resolutions, referred to as 0.8, 0.4, and 0.2 
to correspond with the longitudinal resolution, as shown in Table 1. The longitudinal resolution was 
chosen to have an aspect ratio of one at the central latitude of 60° south. At the lowest resolution, 
0.8, the Rossby Radius of deformation is not resolved, and so the velocity field does not contain 
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eddies. At the next higher resolution of 0.4 eddies form, and even finer and more numerous eddies 
can be seen in simulation 0.2. 

The model induces a surface thermal forcing by restoring the SST to a smooth temperature profile 
ranging from 2°C at 68°S to 12°C at 52°S. The thermal forcing in conjunction with the wind stress 
drive downwelling of warmer waters in the north and deep penetration of colder waters in the south, 
giving rise to the sloping isotherms seen in Figure 9. These tilted isotherms are a source of potential 
energy, driving baroclinic instability. The mesoscale eddies generated from this conversion of potential 
to kinetic energy tend to flatten the isotherms. 

The action of the eddies can be gauged through their effect on the temperature distribution. As 
the resolution of standard POP simulation is increased, mesoscale eddies are better resolved, and 
the isotherms are flatter. Figure 9 shows an important property of the LANS-a model: it allows 
more eddy activity. By capturing the effects of these eddies, lower resolution POP-a simulations also 
have flatter isotherms than standard POP at the same resolution. A global statistic that represents 
the tilting of isotherms is potential temperature averaged over the entire domain (Fig. 10a). With 
progressively higher resolution simulations of standard POP (0.8, 0.4, 0.2) the ocean cools faster 
and levels out to a cooler equilibrium. All simulations begin with a constant temperature of 7°C. 
The cooler global temperature of higher resolution simulations indicate that the isotherms are more 
level due to the eddies. Higher-resolution effects are also seen in the kinetic energy using POP-a: 
kinetic energy averaged over the domain increases with resolution (Fig. 10b); lower resolution POP-a 
simulations also capture this effect. 

Thus the POP-a results in this channel test problem are similar to those in other numerical sim- 
ulations of LANS-a: they produce turbulence statistics that resemble those from higher- resolution 
simulations without LANS-a [34,14,20]. Some of these effects, like the flattening of isotherms, must 
be due to the inclusion of LANS-a in the baroclinic component of POP. Based on previous work in 
barotropic LANS-a simulations [20], barotropic effects are most likely involved as well. 

5.2 Comparison of algorithms 

The purpose of this section is to show that the full and reduced POP-a algorithms produce nearly 
identical results. A qualitative comparison of temperature and velocity fields of simulation 0.4F after 
150 years shows that the dynamics are essentially the same (Figs. 8 and 12). Minor differences in 
the strength and location of eddies are due to the chaotic, time- varying nature of these structures. A 
more comprehensive comparison involves quantitative global statistics such as potential temperature 
and kinetic energy averaged over the domain. The difference between the full and reduced POP-a 
algorithms is less than 0.05% in potential temperature (Fig. 11a), and less than 1% in kinetic energy 
for low resolution experiments (Fig. lib). Differences of up to 20% in global kinetic energy for the 
0.4 cases are due to the high variability caused by transient eddies (Fig. 10b). 

The general goal of turbulence modeling is to produce results similar to higher resolution simulations 
without the computational cost of the higher resolution. Thus the running time of POP-a versus 
POP is a critical metric for the success of POP-a. The full POP-a algorithm using a Helmholtz 
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inversion is nearly as costly as a doubling of resolution (Fig. 13, Table 2); thus it is not a viable 
option for a turbulence model. The reduced algorithm is faster than the full algorithm. Switching 
from a Helmholtz inversion, which requires a CG iterative method for each smoothing step, to a 
simple filter that averages nearest neighbors makes it faster still. A full comparison of timing and 
performance of various smoothing methods is detailed in [25]. The point here is that an efficient 
model results from use of the reduced, rather than the full, algorithm, with nearly identical results. 

An important feature of the POP-a algorithm is that a longer time-step can be taken with it than 
with standard POP (Table 2), consistent with the findings of [21]. The reduced POP-a algorithm 
with a filter is actually faster than POP due to this relaxed time step restriction, so long as the 
comparison is restricted to cases making use of an explicit treatment of the Coriolis term. Even when 
POP is used with the advantageous implicit treatment of the Coriolis term the best POP-a algorithm 
only takes 6.5% longer than the best POP time (at 0.4 resolution). This compares to a doubling of 
resolution of standard POP that takes nine times as long. 

The efficiency of the LANS-a model depends on the value of a in the Helmholtz inversion, or 
equivalently the size of the averaging stencil for the filter. As a — > the LANS-a equations return 
to the Navier-Stokes equations; larger a makes the smooth velocity smoother, so that the LANS- 
a model has greater effect. In the results presented here, a = Ax, the width of one grid-cell. As a 
reaches a value in the range of 2 to 2.5Ax the global kinetic energy grows and the simulation becomes 
unstable. Analogously, simulations may become unstable with larger filter widths. This instability 
can be countered somewhat with higher viscosity, but a threshold of instability for large a still exists. 

We found that the reduced algorithm is less sensitive to this instability than the full algorithm. For 
example, in the 0.4 case, both full and reduced algorithms run stably for a = 1.5Aa;; when a = 2Ax 
the reduced algorithm is stable but the full algorithm is not; when a = 2.5Ax both algorithms are 
unstable. The filter produces similar results, where the reduced algorithm is stable for a wider range 
of filters than the full algorithm. This instability manifests itself in the barotropic solver, where the 
iterative CG routine does not converge. It is not surprising that the full algorithm is more sensitive to 
instabilities than the reduced algorithm in the barotropic solver; in the full algorithm, each iteration 
of the CG solver includes a smoothing step that requires a nested CG solver for the Helmholtz 
inversion. Generally, this type of nested iteration is poor algorithm design. 



6 Conclusions 

Implementation of the LANS-a turbulence parameterization in a primitive equation ocean model 
raises a number of new issues. The expediency of substituting a local smoothing filter for the global 
inverse Helmholtz operation has been previously established [30] but the appearance of the operator 
within the barotropic mode equation raises an additional challenge to efficient implementation. 

We have presented here the details of a fundamental implementation of LANS-a within a primitive 
equation ocean general circulation, with that form referred to here as the full POP-a algorithm. 
Results from the full algorithm have been shown to be consistent with those from previous geophysical 
studies of LANS-a in simpler models [20,21,22]. With this full algorithm as a point of reference we 
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have found an alternative implementation, which we refer to as the reduced POP-a algorithm and 
which produces nearly identical results; the step that is skipped in the reduced form was anticipated 
to have little impact due to the known tolerance for approximation in the barotropic set of equations. 
Either algorithm can be used with local filtering in place of the global Helmholtz inversion. 

The reduced form of the POP-a algorithm used in conjunction with local filtering in place of the 
much more costly global Helmholtz inversion produces a model which is only slightly more expensive 
than plain POP on a per-time-step basis. The longer time step, which is possible with inclusion of 
LANS-a, is countered in our experience by our inability to achieve a stable form of the algorithm 
with an implicit form of the Coriolis term; even so the overall increase in cost with use of our most 
efficient implementation is very small in comparison with the cost associated with a doubling of 
model resolution, making POP-a an attractive and potentially powerful option for ocean climate 
modeling. 

The linear stability analysis of the continuous equations and algorithm gives some insight into how 
the LANS-alpha model improves turbulence characteristics: the LANS-a model effectively makes 
the Rossby radius of deformation larger. Typical ocean-climate simulations either don't resolve or 
just barely resolve the Rossby radius; yet this scale is critical, as it is the size of ocean eddies and 
is the scale where kinetic energy forcing occurs due to the baroclinic instability. Because of the 
way the LANS-a model accounts for the effects of the small scale on the large, it generates an 
'effective' Rossby radius that is larger than in the unaveraged case, making the effects of baroclinic 
instability resolvable on coarser meshes than it could normally appear. One might object that a 
larger Rossby radius is an unphysical representation of the original primitive equation set. This 
would be a valid objection if those scales were well resolved in the first place. However, in most 
ocean-climate simulations today the Rossby radius is smaller than a grid-cell width or as large as a 
few grid-cells. Obviously, the turbulence and kinetic energy forcing cannot be well-represented when 
it is so completely underresolved. By making the effective Rossby radius larger, the LANS-a model 
circumvents this particular deficiency of low resolution, so that global statistics related to baroclinic 
instability more closely resemble those of higher resolution simulations. 

We expect the LANS-a turbulence parameterization to be particularly effective at an ocean model 
grid resolution that has been coarsened by approximately a factor of two to four, relative to the 
resolution required to bring out a vigorous mesoscale eddy field. So-called eddy-resolving ocean 
modeling, capturing such a vigorous eddy field, is known to require a grid resolution on the order 
of 0.1° [35]. Future work should address the question of the interaction of LANS-a with Gent- 
McWilliams-style isopycnal tracer mixing. If the two parameterizations are found to be compatible, as 
would seem likely, and if furthermore the oceanic jet systems such as the Gulf Stream/North Atlantic 
Current come into much more realistic focus with the inclusion of strong eddy transport of momentum 
at what would otherwise be non-eddy-resolving scale, then we enter a new regime of climate modeling 
with much more powerful modeling capabilities at our disposal, enabling long simulations with a more 
detailed and more faithful representation of the climate system. This statement is predicated on not 
one but two caveats. There is more work yet to be done, but we believe the results found thus far 
justify such work towards an ambitious goal. 
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Outline of algorithm - Standard POP 



Start: variables known 
at step n+1 

• temperature, salinity, 
density, pressure at n 

• baroclinic velocities u'l 

• barotropic velocityU" 



Baroclinic computations 

levels k = ]...km 
leap frog step: 

Ji- 



ll 



= uv + 2A tRHS? 



where RHS contains: 

• advection 

• metric (centrifugal) 

• Coriolis 

• diffusion 



J 



subtract depth average: 

j km 

~/i +1 X 1 ,,n+l j_ 

% = u k -—2j n k dz 

■EJ- i — i 



Barotropic computations 

implicit solve for the free 
surface height r} ' + 

new barotropic velocity: 

U" +1 =U n ~ ] +2AtRHS 

where RHS contains: 

■ vertically integrated forcing 

• Vq** 1 , Vrf, V?f +1 




add barotropic velocity 
back to baroclinic: 

~J!+I . TT"+1 



+ U' 



step n+1 complete 



Fig. 1. Schematic of POP algorithm, showing the baroclinic/barotropic splitting. 
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Outline of algorithm - (POP-alpha 



Start: variables known 
at step n+1 

■ temperature, salinity, 
density, pressure at n 



• baroclinic smooth and 
rough velocities U k ,v k 

• barotropic smooth and 

^ rough velocities U" , V" 




Baroclinic computations 

levels k = \...km 
leap frog step: 



+ 2AtRHS n k 



vr 1 = k 



where RHS contains: 

• advection 

• metric (centrifugal) 

• Coriolis 

• diffusion 



extra nonlinear term, Au T, v 



subtract depth average: 

1 km 



u' k l+l = smooth(\ n k +l ) 

t km 

11 t-i 




Barotropic computations 

implicit solve for the free 
surface height rj" + 

new barotropic velocity: 
V " +l = \"~ l +2AtRHS 

where RHS contains: 

• vertically integrated forcing 



compute U" +1 from V 



add barotropic velocity 
back to baroclinic: 



.11+1 



■n+l 



+ U 



— I! +1 



+ V 



« +1 



step n+1 complete 



Fig. 2. Schematic of POP-a algorithm, showing the baroclinic/barotropic splitting. 
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Fig. 3. Gravity wave stability analysis results for POP (a), POP-a full algorithm (b), and POP-a reduced 
algorithm (c) . Each plot shows the damping factors and phase speed as a function of the Courant-Friedrich- 
s-Lewy number kAtc g , and uses a = 1/kh, £ = 7= 1/3, and 6 = 1. 




Fig. 4. The effective Rossby radius of deformation, R*, as a function of a, where both are normalized 
by the non-alpha model Rossby Radius, R. This shows that the LANS-a model increases the effective 
Rossby Radius. This relation was produced using the dispersion relation for Rossby waves in the continuous 
shallow water LANS-a equations. The effective Rossby deformation radius includes the effects of the small, 
unresolved scales on the large. 
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Fig. 5. Rossby wave stability analysis results for POP (a) and POP-a (b) as a function of wave number k. 
For both algorithms, the physical Rossby wave is undamped. POP-a slows down Rossby waves, while POP 
does not. Here B 2 = 1/4 and a = 1/8 of the domain width. 
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Fig. 6. Damping factor for the discrete equations for purely zonal waves of the barotropic beta plane equations 
at midlatitudes (e = 3), using a time step of Atf = 1/2, a = 1/8 of the domain width, and typical POP 
parameters (£ = 7 = 1/3, 6 = 1) and explicit Coriolis terms (£' = 7' = 0). 
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Fig. 7. Schematic of the model domain, which has periodic zonal boundaries, solid north/south boundaries, 
a deep-sea ridge, surface wind forcing, and thermal forcing. 
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(a) Surface pot. temp., full algorithm 



(b) Surface pot. temp., reduced algorithm 





, \ \ — ; 



'//// 
■ / // 1 • 



(c) Pot. temp, at 1600m, full algorithm (d) Pot. temp, at 1600m, reduced algorithm 

Fig. 8. Snapshots of the rough velocity field v at 150 years for experiment 0.4F. A deep-sea ridge between 
11°E and 18°E causes northward and then southward flow, and spurs eddies east of 18°E. The full and 
reduced POP-o algorithm produce dynamics that are nearly identical. 
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Fig. 9. Depth of 6°C isotherm of potential temperature, averaged between 0°E and 10°E. Isotherms flatten 
with increasing resolution of standard POP (solid, dotted), due to the effects of mesoscale eddies. Simulations 
using POP-a (dash, dash-dot) have flatter isotherms than standard POP at the same resolution. 




Fig. 10. Global mean potential temperature (a) and kinetic energy (b) as a function of time using standard 
POP and POP-a. Higher resolution simulations (0.4 and 0.2) reach a cooler steady-state (a) and have higher 
kinetic energy (b) due to the activity of mesoscale eddies. POP-a simulations (0.8H, 0.4H) caputures the 
effects of these eddies at lower resolution that standard POP. This trend increases with larger a. 
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Fig. 11. Relative difference between the full POP-a algorithm and reduced POP-a algorithm for several 
experiments. The difference in the global mean potential temperature (a) is less than 0.05% in all cases; the 
difference in global mean kinetic energy (b) is generally less than 1% for lower resolution (0.8°) cases, but 
is larger for higher resolution due to the variability, as seen in Fig. 10b. Statistics are only available for the 
first 50 years of 0.4H because the full POP-a algorithm is so slow. 
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(a) Surface pot. temp., full algorithm (b) Surface pot. temp., reduced algorithm 




0" 4°E 8°E 12°E 16°E 20°E 24°E 28°E C° 4°E 8°E 12"E 16°E 20°E 24°E 28°E 



(c) Pot. temp, at 1600m, full algorithm (d) Pot. temp, at 1600m, reduced algorithm 

Fig. 12. Snapshots of potential temperature in °C at 150 years for full and reduced algorithm for experiment 
0.4F. This shows that the two algorithms produce nearly identical temperature fields. 
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POP, implicit Coriolis 
POP, explicit Coriolis 
POP-oc reduced Filter 
POP-oc reduced Helmholtz 
POP-oc full Filter 
POP-oc full Helmholtz 
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0.4 



resolution 

Fig. 13. Timing data from various implementations of POP and POP-a. All simulations that use the 
Helmholtz inversion to smooth are extremely slow. The POP-a algorithm can be sped up in two ways: 
using a filter, rather than a Helmholtz inversion to smooth; and changing from the full to the reduced 
algorithm. All POP-a simulations use explicit Coriolis discretization. Both implicit and explicit Coriolis 
simulations of POP are shown for comparison. 
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Table 1 

Model parameters for experiments discussed in this paper, where fw is the filter width; grid is the number 
of gridpoints in (cc, y, z); Ion is the longitudinal grid-cell width; and lat is the latitudinal grid-cell width. The 
names correspond to the meridional resolution and type of smoothing. All POP-a simulations were run with 
both the full and reduced algorithms. 



algorithm 


Cor 


steps / day 


clock time 






resolut 


ion 


resolution 








0.8 


0.4 


0.2 


0.8 


0.4 


0.2 


POP 


imp 


12 


22 


40 


4.16 


21.3 


195 


POP 


exp 


20 


32 


52 


5.82 


28.4 


212 


red. POP-a filter 


exp 


18 


24 




5.09 


22.7 




red. POP-a Helm. 


exp 


12 


18 




10.21 


33.0 




full POP-a filter 


exp 


16 


24 




5.30 


40.2 




full POP-a Helm. 


exp 


12 


18 




19.56 


97.8 





Table 2 

Minimimum steps/day and the resulting clock time for various algorithms and resoltions. The Cor column 
states whether the barotropic Coriolis term is implicit or exlpicit. Clock time is in processor-hours per 
simulated decade. The fastest POP-a algorithm is the reduced algorithm with a filter. Even though the 
POP-a algorithms use explicit Coriolis discretization, the timestep is smaller than standard POP with 
explicit Coriolis. 
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